ITCH: Information-Theoretic Cluster Hierarchies
Abstract
Hierarchical clustering methods are widely used in scientific domains such as molecular biology, medicine, and economics. Despite the maturity of hierarchical clustering research, we identify four goals that previous methods do not yet fully satisfy. First, the algorithm should be guided to identify only meaningful and valid clusters. Second, each cluster in the hierarchy should be represented by an intuitive description, such as a probability density function. Third, outliers should be handled consistently. Fourth, difficult parameter settings should be avoided. With ITCH, we propose a novel clustering method built on a hierarchical variant of the information-theoretic principle of Minimum Description Length (MDL), referred to as hMDL. Because the hierarchical cluster structure can be interpreted as a statistical model of the data set, it can be used to compress the data effectively with Huffman coding. The achievable compression rate thus induces a natural objective function for clustering, which automatically satisfies all four goals above.
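The idea that a cluster model is good when it compresses the data well can be made concrete with a small sketch. The abstract does not spell out how hMDL scores a hierarchy, so the code below is not ITCH's criterion. It is a flat, one-dimensional illustration under stated assumptions: each cluster is modeled by a Gaussian, coordinates are quantized to a hypothetical precision `delta`, cluster labels are Huffman-coded by cluster size, and each model parameter costs the common 0.5·log2(n) bits. All function names (`huffman_code_lengths`, `gaussian_code_length`, `description_length`) are hypothetical.

```python
import heapq
import math
from collections import Counter
from statistics import mean, pstdev


def huffman_code_lengths(freqs):
    """Return {symbol: codeword length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        # With a single cluster, the label is implied and costs 0 bits.
        return {symbol: 0 for symbol in freqs}
    # Heap entries: (total weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol in them gets one bit deeper.
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def gaussian_code_length(x, mu, sigma, delta):
    """Bits to encode x, quantized to precision delta, under N(mu, sigma^2)."""
    # Clamp sigma (an assumption of this sketch) so a zero-variance cluster
    # cannot encode points for free; this also keeps density * delta below 1.
    sigma = max(sigma, delta)
    density = math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    return -math.log2(density * delta)


def description_length(points, labels, delta=0.01):
    """Total bits: cluster labels + coordinates + model parameters."""
    n = len(points)
    clusters = {}
    for x, c in zip(points, labels):
        clusters.setdefault(c, []).append(x)
    # Labels: one Huffman codeword per point; larger clusters get shorter codes.
    lengths = huffman_code_lengths(Counter(labels))
    label_bits = sum(lengths[c] for c in labels)
    # Coordinates: each point is coded under its own cluster's Gaussian.
    data_bits = 0.0
    for members in clusters.values():
        mu, sigma = mean(members), pstdev(members)
        data_bits += sum(gaussian_code_length(x, mu, sigma, delta) for x in members)
    # Parameters: mean and std dev per cluster, 0.5 * log2(n) bits each (assumed).
    param_bits = len(clusters) * 2 * 0.5 * math.log2(n)
    return label_bits + data_bits + param_bits


if __name__ == "__main__":
    data = [0.0, 0.1, -0.1, 0.2, -0.2, 10.0, 10.1, 9.9, 10.2, 9.8]
    one_cluster = [0] * 10
    two_clusters = [0] * 5 + [1] * 5
    print("1 cluster :", round(description_length(data, one_cluster), 1), "bits")
    print("2 clusters:", round(description_length(data, two_clusters), 1), "bits")
```

On this bimodal sample, the two-cluster labeling should need fewer bits than the single cluster: the extra label bits and parameters cost less than what the two tighter Gaussians save on the coordinates. Choosing the labeling with the smallest total is the compression-as-objective idea in its simplest flat form; how the hierarchical variant extends it is not described in the abstract.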
Similar resources
Hierarchical Information Clustering by Means of Topologically Embedded Graphs
We introduce a graph-theoretic approach to extract clusters and hierarchies in complex data-sets in an unsupervised and deterministic manner, without the use of any prior information. This is achieved by building topologically embedded networks containing the subset of most significant links and analyzing the network structure. For a planar embedding, this method provides both the intra-cluster...
Nonlinear Schrödinger Equations for Identical Particles and the Separation Property
We investigate the separation property for hierarchies of Schrödinger operators for identical particles. We show that such hierarchies of translation invariant second order differential operators are necessarily linear. A weakened form of the separation property, related to a strong form of cluster decomposition, allows for homogeneous hierarchies of nonlinear differential operators. Some conne...
Hierarchical Information-theoretic Co-clustering for High Dimensional Data
Hierarchical clustering is an important technique for hierarchical data exploration applications. However, most existing hierarchical methods are based on traditional one-side clustering, which is not effective for handling high dimensional data. In this paper, we develop a partitional hierarchical co-clustering framework and propose a Hierarchical Information-Theoretical Co-Clustering (HITCC) a...
An information theoretic approach to hierarchical clustering combination
In Hierarchical Clustering, a set of patterns is partitioned into a sequence of groups represented as a dendrogram. The dendrogram is a tree representation where each node is associated with merging of two (or more) partitions and hence each partition is nested into the next partition. Hierarchical representation has properties that are useful for visualization and interpretation of clustering...
GALOIS: An Order-Theoretic Approach to Conceptual Clustering
The theory of concept (or Galois) lattices provides a natural and formal setting in which to discover and represent concept hierarchies. In this paper we present a system, GALOIS, which is able to determine the concept lattice corresponding to a given set of objects. GALOIS is incremental and relatively efficient, the time complexity of each update ranging from O(n) to O(n^2) where n is the numb...
Publication date: 2010